NSF PAR Search | NSF Public Access Repository

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Si, Chenglei; Zhang, Yanzhe; Li, Ryan; Yang, Zhengyuan; Liu, Ruibo; Yang, Diyi (April 2025, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))

Free, publicly-accessible full text available April 1, 2026
Grounding-Tracking-Integration

https://doi.org/10.1109/TCSVT.2020.3038720

Yang, Zhengyuan; Kumar, Tushar; Chen, Tianlang; Su, Jingsong; Luo, Jiebo (September 2021, IEEE Transactions on Circuits and Systems for Video Technology)
Full Text Available
SAT: 2D Semantics Assisted Training for 3D Visual Grounding

Yang, Zhengyuan; Zhang, Songyang; Wang, Liwei; Luo, Jiebo (January 2021, International Conference on Computer Vision)

Full Text Available
Improving One-stage Visual Grounding by Recursive Sub-query Construction

https://doi.org/10.1007/978-3-030-58568-6_23

Yang, Zhengyuan; Chen, Tianlang; Wang, Liwei; Luo, Jiebo (January 2020, European Conference on Computer Vision)
Full Text Available
Human-Centered Emotion Recognition in Animated GIFs

Yang, Zhengyuan; Zhang, Yixuan; Luo, Jiebo (January 2019, Proceedings of IEEE International Conference on Multimedia and Expo)

As an intuitive way of expressing emotion, animated Graphics Interchange Format (GIF) images have been widely used on social media. Most previous studies on automated GIF emotion recognition fail to effectively utilize GIFs' unique properties, which potentially limits recognition performance. In this study, we demonstrate the importance of human-related information in GIFs and conduct human-centered GIF emotion recognition with a proposed Keypoint Attended Visual Attention Network (KAVAN). The framework consists of a facial attention module and a hierarchical segment temporal module. The facial attention module exploits the strong relationship between GIF contents and human characters, and extracts frame-level visual features with a focus on human faces. The Hierarchical Segment LSTM (HSLSTM) module is then proposed to better learn global GIF representations. Our proposed framework outperforms state-of-the-art methods on the MIT GIFGIF dataset. Furthermore, the facial attention module provides reliable facial region mask predictions, which improves the model's interpretability.
Full Text Available
Attentive Relational Networks for Mapping Images to Scene Graphs

Qi, Mengshi; Li, Weijian; Yang, Zhengyuan; Wang, Yunhong; Luo, Jiebo (January 2019, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition)

Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and the relationships among them. Despite the recent success of deep learning techniques in object detection, inferring complex contextual relationships and structured graph representations from visual data remains challenging. In this study, we propose a novel Attentive Relational Network that consists of two key modules on top of an object detection backbone. The first is a semantic transformation module that captures semantically embedded relation features by translating visual and linguistic features into a common semantic space. The second is a graph self-attention module that embeds a joint graph representation by assigning different importance weights to neighboring nodes. Finally, a relation inference module recognizes all entities and their corresponding relations to produce accurate scene graphs. We evaluate our proposed method on the widely adopted Visual Genome dataset, and the results demonstrate the effectiveness and superiority of our model.
Full Text Available
Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences

https://doi.org/10.1109/TCSVT.2018.2864148

Yang, Zhengyuan; Li, Yuncheng; Yang, Jianchao; Luo, Jiebo (August 2018, IEEE Transactions on Circuits and Systems for Video Technology)

Full Text Available
